Skip to content

Add CHEDDAR GPU backend/emitter#2896

Open
AlexanderViand wants to merge 2 commits into
google:mainfrom
AlexanderViand:cheddar-backend
Open

Add CHEDDAR GPU backend/emitter#2896
AlexanderViand wants to merge 2 commits into
google:mainfrom
AlexanderViand:cheddar-backend

Conversation

@AlexanderViand
Copy link
Copy Markdown
Collaborator

@AlexanderViand AlexanderViand commented Apr 23, 2026

Part of the work to make Cheddar a first-class backend, extracted to its own PR because the diff was getting out of hand :)

This adds the backend itself, i.e., the cheddar dialect, --scheme-to-cheddar passes, emitter, etc and a few basic tests (including some very simple "e2e" ones). It also adds Cheddar itself as an optional (default) off Bazel dependency.

Question: Do we want to run the e2e GPU tests on CI? If yes, what runner/etc should we be using?

This PR does not yet enable HEIR to target more complex programs to Cheddar, because Cheddar has very strict ideas about the acceptable scales of things, and with just this PR, HEIR's mgmt layer still has no idea about the Rational-Rescale-ish "25-30" system Cheddar uses.

@j2kun j2kun self-requested a review May 4, 2026 16:19
Copy link
Copy Markdown
Collaborator

@j2kun j2kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will review the IR parts after office hours, but one major request below

Comment thread MODULE.bazel
)

# CHEDDAR GPU FHE library (opt-in, requires CUDA)
cheddar_extensions = use_extension("//bazel:extensions.bzl", "cheddar_deps")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there's a lot of stuff in this PR around adding the cheddar backend (bazel overlay, opt-in config, etc.) but as far as I can tell nothing tests it because there are no end-to-end tests in this PR.

Can we either include an end-to-end test in this PR, or else have this PR only include the MLIR-only parts? (dialect, transforms, lit tests, emitter).

In particular, when I pull this branch to my local machine and I try running bazel build @cheddar, I get

$ bazel build @cheddar  
ERROR: /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/rules_cuda++toolchain+cuda/BUILD: no such target '@@rules_cuda++toolchain+cuda//:thrust': target 'thrust' not declared in package '' defined by /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/rules_cuda++toolchain+cuda/BUILD
ERROR: /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/+cheddar_deps+cheddar/BUILD.bazel:50:11: no such target '@@rules_cuda++toolchain+cuda//:thrust': target 'thrust' not declared in package '' defined by /usr/local/google/home/jkun/.cache/bazel/_bazel_jkun/886f3804b421b5442165b2f8eb57dad5/external/rules_cuda++toolchain+cuda/BUILD and referenced by '@@+cheddar_deps+cheddar//:cheddar'
ERROR: Analysis of target '@@+cheddar_deps+cheddar//:cheddar' failed; build aborted: Analysis failed
INFO: Elapsed time: 1.340s, Critical Path: 0.02s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@missing tests: indeed, that's an oopsie on my side while extracting this out of the overall cheddar WIP branch. As we discussed in the office hour yesterday, I'll split this PR into three

  1. Dialect
  2. Emitter and FileCheck tests (I think I'll re-use this PR for this, to preserve the relevant review comments)
  3. Bazel dependency & end-to-end tests

@ local machine: Does your local machine have an Nvidia GPU with CUDA/etc installed? Otherwise, I'm pretty sure that this is an expected failure, due to @rules_cuda being unable to find the required CUDA related stuff? Maybe there's a way to set up the BUILD file to get a nicer/more helpful error message? Since you're directly asking bezel to build the cheddar target, enable_cheddar isn't "protecting" you here, I guess?

Copy link
Copy Markdown
Collaborator

@j2kun j2kun May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://bazel-contrib.github.io/rules_cuda/0.3.0/

For hermetic toolchains, the rules handle toolchain configuration and library downloading automatically. See cuda.redist_json integration test for a comprehensible example.

For locally installed toolchains, _detect_local_cuda_toolkit and detect_clang determines how they are detected.

Either situation depends on cc toolchain availability, so you must also ensure the cc compiler is properly configured. On Windows, this means that you will also need to set the environment variable BAZEL_VC properly.

I would like to try this again on a machine that has an NVIDIA GPU as well as the hermetic LLVM toolchain. In theory, with the hermetic toolchain we should be able to build the targets even if the machine does not have a GPU or CUDA installed. But yes, we should split that into a standalone PR.

@j2kun
Copy link
Copy Markdown
Collaborator

j2kun commented May 5, 2026

Question: Do we want to run the e2e GPU tests on CI? If yes, what runner/etc should we be using?

A quick look into this and it appears the Google org does not have any runners configured with a GPU. So this may be an obstacle and might force us to set up a Google-internal CI that runs on GCP and mirrors its results to GitHub.

Comment thread lib/Dialect/Cheddar/IR/CheddarAttributes.td Outdated
}];
let parameters = (ins
StringRefParameter<"path to JSON parameter file">:$path,
DefaultValuedParameter<"bool", "false", "use 64-bit word type">:$use64Bit
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field appears to be unused. (In fact, this entire attribute appears to be unused)

Comment thread lib/Dialect/Cheddar/IR/CheddarDialect.td Outdated
let results = (outs Cheddar_Ciphertext:$output);
}

def Cheddar_MadUnsafeOp : Cheddar_Op<"mad_unsafe"> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "mad" and not "mac"?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's (for whatever reason) what Cheddar calls the function: https://github.com/scale-snu/cheddar-fhe/blob/main/include/core/Context.h#L377 and I figure making the exit dialects map closely to the library API is more important than making the naming consistent with other HEIR ops?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Persisting what appears to be a typo...is just the right amount of whimsy. Maybe leave a comment for future confused people.

// `GetRotKeyOp` stores a static distance attribute, but dynamic
// `ckks.rotate` lowering still needs a placeholder key op so the emitter can
// trace back to the `UserInterface`. This sentinel distance marks that case.
constexpr int64_t kDynamicRotationKeyDistanceSentinel = -1;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you don't want to allow the GetRotKeyOp to take both dynamic and static options as well?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this emitter makes me think we should invest some time into converting these emitters to use the emitc dialect, so that the vast majority of the codegen details can be avoided. 😰

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at emitC, it's come a long way since I last used it in HECO, and the explicit variable handling looks really nice! I'm kind of tempted to try rewriting the Cheddar emitter into EmitC.... 😅

Comment on lines +473 to +483
while (currentLevel > targetLevelVal) {
current = cheddar::RescaleOp::create(rewriter, op.getLoc(), cheddarCtType,
ctx.value(), current);
--currentLevel;
auto encodedOne = cheddar::EncodeConstantOp::create(
rewriter, op.getLoc(), cheddar::ConstantType::get(getContext()),
encoder.value(), one, rewriter.getI64IntegerAttr(currentLevel),
rewriter.getI64IntegerAttr(*logDefaultScale));
current =
cheddar::MultConstOp::create(rewriter, op.getLoc(), cheddarCtType,
ctx.value(), current, encodedOne);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment to explain why this needs a loop would be helpful.

// multiplies), so it maps directly onto CHEDDAR's logScale field.
int64_t logScale = invEncoding.getScalingFactor();

// lwe.rlwe_encode doesn't carry a plaintext level on main, so encode at
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a RingAttr doesn't it? Isn't that support to encode the level via its RNS coefficient type? We might not instantiate those properly at the moment, but I know for sure the level analysis properly determines the needed levels of plaintexts involved in Pt-Ct ops, cf. encodeCleartextAsPlaintext. If there is a gap here please find/file the right GH issue and link it here with a TODO

target.addLegalDialect<cheddar::CheddarDialect>();
target.addIllegalDialect<ckks::CKKSDialect>();
target
.addIllegalOp<lwe::RLWEEncryptOp, lwe::RLWEDecryptOp, lwe::RLWEEncodeOp,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make the LWE dialect illegal wholesale? Cheddar has its own types, after all.

return failure();
}

// Scaling factors on CKKS encodings are stored log-additively on main (a
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comments talking about "on main" will become strange once they are merged into main.

Co-authored-by: Jeremy Kun <kun.jeremy@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants